{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.1 supervised classification\n",
    "- 文本 -> 特征提取，建模表示 -> 机器学习模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gender Identify\n",
    "-  特征:  男女名字特点，a,e,i结尾的是女性，K,o,r,s结尾的是男性\n",
    "    - 特征使用简单的类型编码，易于计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from nltk.corpus import names\n",
    "import random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import nltk\n",
    "# nltk.download()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'last_letter': 'k'}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def gender_features(word):\n",
    "    return {'last_letter':word[-1]}\n",
    "gender_features('Shrek')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "names = ([(name,'male') for name in names.words('male.txt')] + [(name,'female') for name in names.words('female.txt')])\n",
    "random.shuffle(names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "# 特征提取\n",
    "featruesets = [(gender_features(n),g) for (n,g)in names]\n",
    "train_set, test_set = featruesets[500:],featruesets[:500]\n",
    "\n",
    "#  模型训练\n",
    "classifier = nltk.NaiveBayesClassifier.train(train_set)\n",
    "\n",
    "#  模型测试\n",
    "print(classifier.classify(gender_features(\"Neo\")))\n",
    "print(classifier.classify(gender_features(\"Trinity\")))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.778"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#  测试集合评估这个分类器\n",
    "nltk.classify.accuracy(classifier,test_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most Informative Features\n",
      "             last_letter = 'a'            female : male   =     36.8 : 1.0\n",
      "             last_letter = 'k'              male : female =     31.9 : 1.0\n",
      "             last_letter = 'p'              male : female =     18.8 : 1.0\n",
      "             last_letter = 'f'              male : female =     14.7 : 1.0\n",
      "             last_letter = 'v'              male : female =     10.6 : 1.0\n"
     ]
    }
   ],
   "source": [
    "#  区分哪些特征对于区分名字是最有效的， 以a结尾的名字是男性的38倍。似然比来比较不同特征结果关系\n",
    "classifier.show_most_informative_features(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from nltk.classify import apply_features\n",
    "train_set = apply_features(gender_features, names[500:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 选择正确的特征\n",
    "- 并不是特征越多越好，容易在小数据集上过拟合，泛化效果不佳"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'count(a)': 0,\n",
       " 'count(b)': 0,\n",
       " 'count(c)': 0,\n",
       " 'count(d)': 0,\n",
       " 'count(e)': 0,\n",
       " 'count(f)': 0,\n",
       " 'count(g)': 0,\n",
       " 'count(h)': 1,\n",
       " 'count(i)': 0,\n",
       " 'count(j)': 1,\n",
       " 'count(k)': 0,\n",
       " 'count(l)': 0,\n",
       " 'count(m)': 0,\n",
       " 'count(n)': 1,\n",
       " 'count(o)': 1,\n",
       " 'count(p)': 0,\n",
       " 'count(q)': 0,\n",
       " 'count(r)': 0,\n",
       " 'count(s)': 0,\n",
       " 'count(t)': 0,\n",
       " 'count(u)': 0,\n",
       " 'count(v)': 0,\n",
       " 'count(w)': 0,\n",
       " 'count(x)': 0,\n",
       " 'count(y)': 0,\n",
       " 'count(z)': 0,\n",
       " 'firstletter': 'j',\n",
       " 'has(a)': False,\n",
       " 'has(b)': False,\n",
       " 'has(c)': False,\n",
       " 'has(d)': False,\n",
       " 'has(e)': False,\n",
       " 'has(f)': False,\n",
       " 'has(g)': False,\n",
       " 'has(h)': True,\n",
       " 'has(i)': False,\n",
       " 'has(j)': True,\n",
       " 'has(k)': False,\n",
       " 'has(l)': False,\n",
       " 'has(m)': False,\n",
       " 'has(n)': True,\n",
       " 'has(o)': True,\n",
       " 'has(p)': False,\n",
       " 'has(q)': False,\n",
       " 'has(r)': False,\n",
       " 'has(s)': False,\n",
       " 'has(t)': False,\n",
       " 'has(u)': False,\n",
       " 'has(v)': False,\n",
       " 'has(w)': False,\n",
       " 'has(x)': False,\n",
       " 'has(y)': False,\n",
       " 'has(z)': False,\n",
       " 'lastletter': 'n'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#  一个特征提取器，过拟合性别特征。这个特征提取器返回的特征集包括大量指\n",
    "# 定的特征，从而导致对于相对较小的名字语料库过拟合。\n",
    "\n",
    "def gender_features2(name):\n",
    "    features = {}\n",
    "    features[\"firstletter\"] = name[0].lower()\n",
    "    features[\"lastletter\"] = name[-1].lower()\n",
    "    for letter in 'abcdefghijklmnopqrstuvwxyz':\n",
    "        features[\"count(%s)\" % letter] = name.lower().count(letter)\n",
    "        features[\"has(%s)\" % letter] = (letter in name.lower())\n",
    "    return features\n",
    "gender_features2('John')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.804"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "featuresets = [(gender_features2(n), g) for (n,g) in names]\n",
    "train_set, test_set = featuresets[500:], featuresets[:500]\n",
    "classifier = nltk.NaiveBayesClassifier.train(train_set)\n",
    "nltk.classify.accuracy(classifier, test_set)\n",
    "\n",
    "\n",
    "#  尴尬了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  错误分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train_names = names[1500:]\n",
    "devtest_names = names[500:1500]\n",
    "test_names = names[:500]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.752"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_set = [(gender_features(n), g) for (n,g) in train_names]\n",
    "devtest_set = [(gender_features(n), g) for (n,g) in devtest_names]\n",
    "test_set = [(gender_features(n), g) for (n,g) in test_names]\n",
    "classifier = nltk.NaiveBayesClassifier.train(train_set)\n",
    "nltk.classify.accuracy(classifier, devtest_set) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "errors = []\n",
    "for (name, tag) in devtest_names:\n",
    "    guess = classifier.classify(gender_features(name))\n",
    "    if guess != tag:\n",
    "        errors.append( (tag, guess, name) )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "correct=female   guess=male     name=Aimil                         \n",
      "correct=female   guess=male     name=Alisun                        \n",
      "correct=female   guess=male     name=Allison                       \n",
      "correct=female   guess=male     name=Allyn                         \n",
      "correct=female   guess=male     name=Allyson                       \n",
      "correct=female   guess=male     name=Alyss                         \n",
      "correct=female   guess=male     name=Anett                         \n",
      "correct=female   guess=male     name=Annabal                       \n",
      "correct=female   guess=male     name=Ardys                         \n",
      "correct=female   guess=male     name=Aryn                          \n",
      "correct=female   guess=male     name=Beret                         \n",
      "correct=female   guess=male     name=Blair                         \n",
      "correct=female   guess=male     name=Brandais                      \n",
      "correct=female   guess=male     name=Calypso                       \n",
      "correct=female   guess=male     name=Carlynn                       \n",
      "correct=female   guess=male     name=Caroljean                     \n",
      "correct=female   guess=male     name=Carolyn                       \n",
      "correct=female   guess=male     name=Carolynn                      \n",
      "correct=female   guess=male     name=Carrol                        \n",
      "correct=female   guess=male     name=Catherin                      \n",
      "correct=female   guess=male     name=Cathrin                       \n",
      "correct=female   guess=male     name=Charis                        \n",
      "correct=female   guess=male     name=Charlott                      \n",
      "correct=female   guess=male     name=Chloris                       \n",
      "correct=female   guess=male     name=Clem                          \n",
      "correct=female   guess=male     name=Colleen                       \n",
      "correct=female   guess=male     name=Consuelo                      \n",
      "correct=female   guess=male     name=Dallas                        \n",
      "correct=female   guess=male     name=Danell                        \n",
      "correct=female   guess=male     name=Darb                          \n",
      "correct=female   guess=male     name=Daveen                        \n",
      "correct=female   guess=male     name=Dawn                          \n",
      "correct=female   guess=male     name=Debor                         \n",
      "correct=female   guess=male     name=Deeann                        \n",
      "correct=female   guess=male     name=Diahann                       \n",
      "correct=female   guess=male     name=Donnajean                     \n",
      "correct=female   guess=male     name=Doralynn                      \n",
      "correct=female   guess=male     name=Drew                          \n",
      "correct=female   guess=male     name=Edin                          \n",
      "correct=female   guess=male     name=Ellyn                         \n",
      "correct=female   guess=male     name=Emmalyn                       \n",
      "correct=female   guess=male     name=Eran                          \n",
      "correct=female   guess=male     name=Esther                        \n",
      "correct=female   guess=male     name=Ethelin                       \n",
      "correct=female   guess=male     name=Flor                          \n",
      "correct=female   guess=male     name=Flower                        \n",
      "correct=female   guess=male     name=Gen                           \n",
      "correct=female   guess=male     name=Gert                          \n",
      "correct=female   guess=male     name=Gilligan                      \n",
      "correct=female   guess=male     name=Gladis                        \n",
      "correct=female   guess=male     name=Greer                         \n",
      "correct=female   guess=male     name=Gretchen                      \n",
      "correct=female   guess=male     name=Gwyn                          \n",
      "correct=female   guess=male     name=Heather                       \n",
      "correct=female   guess=male     name=Honor                         \n",
      "correct=female   guess=male     name=Ingrid                        \n",
      "correct=female   guess=male     name=Jacklin                       \n",
      "correct=female   guess=male     name=Jan                           \n",
      "correct=female   guess=male     name=Janifer                       \n",
      "correct=female   guess=male     name=Jasmin                        \n",
      "correct=female   guess=male     name=Jean                          \n",
      "correct=female   guess=male     name=Jesselyn                      \n",
      "correct=female   guess=male     name=Jo                            \n",
      "correct=female   guess=male     name=Jo-Ann                        \n",
      "correct=female   guess=male     name=Joann                         \n",
      "correct=female   guess=male     name=Joslyn                        \n",
      "correct=female   guess=male     name=Kaitlin                       \n",
      "correct=female   guess=male     name=Karel                         \n",
      "correct=female   guess=male     name=Karlyn                        \n",
      "correct=female   guess=male     name=Kellen                        \n",
      "correct=female   guess=male     name=Kerstin                       \n",
      "correct=female   guess=male     name=Kim                           \n",
      "correct=female   guess=male     name=Kym                           \n",
      "correct=female   guess=male     name=Leeann                        \n",
      "correct=female   guess=male     name=Lyn                           \n",
      "correct=female   guess=male     name=Lyndel                        \n",
      "correct=female   guess=male     name=Manon                         \n",
      "correct=female   guess=male     name=Mariam                        \n",
      "correct=female   guess=male     name=Marion                        \n",
      "correct=female   guess=male     name=Marit                         \n",
      "correct=female   guess=male     name=Maryellen                     \n",
      "correct=female   guess=male     name=Mavis                         \n",
      "correct=female   guess=male     name=Mead                          \n",
      "correct=female   guess=male     name=Meggan                        \n",
      "correct=female   guess=male     name=Meghann                       \n",
      "correct=female   guess=male     name=Meriel                        \n",
      "correct=female   guess=male     name=Michal                        \n",
      "correct=female   guess=male     name=Morgen                        \n",
      "correct=female   guess=male     name=Muffin                        \n",
      "correct=female   guess=male     name=Murial                        \n",
      "correct=female   guess=male     name=Muriel                        \n",
      "correct=female   guess=male     name=Nan                           \n",
      "correct=female   guess=male     name=Noellyn                       \n",
      "correct=female   guess=male     name=Noreen                        \n",
      "correct=female   guess=male     name=Olwen                         \n",
      "correct=female   guess=male     name=Orel                          \n",
      "correct=female   guess=male     name=Perl                          \n",
      "correct=female   guess=male     name=Persis                        \n",
      "correct=female   guess=male     name=Phillis                       \n",
      "correct=female   guess=male     name=Rahal                         \n",
      "correct=female   guess=male     name=Rosalynd                      \n",
      "correct=female   guess=male     name=Roselyn                       \n",
      "correct=female   guess=male     name=Roz                           \n",
      "correct=female   guess=male     name=Saraann                       \n",
      "correct=female   guess=male     name=Shamit                        \n",
      "correct=female   guess=male     name=Shaylyn                       \n",
      "correct=female   guess=male     name=Shir                          \n",
      "correct=female   guess=male     name=Sileas                        \n",
      "correct=female   guess=male     name=Sinead                        \n",
      "correct=female   guess=male     name=Starlin                       \n",
      "correct=female   guess=male     name=Stoddard                      \n",
      "correct=female   guess=male     name=Tess                          \n",
      "correct=female   guess=male     name=Winifred                      \n",
      "correct=male     guess=female   name=Alfonse                       \n",
      "correct=male     guess=female   name=Allah                         \n",
      "correct=male     guess=female   name=Arnie                         \n",
      "correct=male     guess=female   name=Arvie                         \n",
      "correct=male     guess=female   name=Avi                           \n",
      "correct=male     guess=female   name=Barnabe                       \n",
      "correct=male     guess=female   name=Barnaby                       \n",
      "correct=male     guess=female   name=Beale                         \n",
      "correct=male     guess=female   name=Berke                         \n",
      "correct=male     guess=female   name=Billie                        \n",
      "correct=male     guess=female   name=Blaine                        \n",
      "correct=male     guess=female   name=Bradly                        \n",
      "correct=male     guess=female   name=Brandy                        \n",
      "correct=male     guess=female   name=Broddie                       \n",
      "correct=male     guess=female   name=Butch                         \n",
      "correct=male     guess=female   name=Carlyle                       \n",
      "correct=male     guess=female   name=Chase                         \n",
      "correct=male     guess=female   name=Chaunce                       \n",
      "correct=male     guess=female   name=Clarence                      \n",
      "correct=male     guess=female   name=Claybourne                    \n",
      "correct=male     guess=female   name=Connolly                      \n",
      "correct=male     guess=female   name=Corby                         \n",
      "correct=male     guess=female   name=Corrie                        \n",
      "correct=male     guess=female   name=Dani                          \n",
      "correct=male     guess=female   name=Danny                         \n",
      "correct=male     guess=female   name=Dave                          \n",
      "correct=male     guess=female   name=Davie                         \n",
      "correct=male     guess=female   name=Denny                         \n",
      "correct=male     guess=female   name=Dietrich                      \n",
      "correct=male     guess=female   name=Dimitry                       \n",
      "correct=male     guess=female   name=Dmitri                        \n",
      "correct=male     guess=female   name=Duffie                        \n",
      "correct=male     guess=female   name=Eddy                          \n",
      "correct=male     guess=female   name=Emile                         \n",
      "correct=male     guess=female   name=Emmy                          \n",
      "correct=male     guess=female   name=Emory                         \n",
      "correct=male     guess=female   name=Filmore                       \n",
      "correct=male     guess=female   name=Freddy                        \n",
      "correct=male     guess=female   name=French                        \n",
      "correct=male     guess=female   name=Garcia                        \n",
      "correct=male     guess=female   name=Garvey                        \n",
      "correct=male     guess=female   name=Geoffrey                      \n",
      "correct=male     guess=female   name=Geri                          \n",
      "correct=male     guess=female   name=Godfree                       \n",
      "correct=male     guess=female   name=Grace                         \n",
      "correct=male     guess=female   name=Graehme                       \n",
      "correct=male     guess=female   name=Greggory                      \n",
      "correct=male     guess=female   name=Gregory                       \n",
      "correct=male     guess=female   name=Haleigh                       \n",
      "correct=male     guess=female   name=Harry                         \n",
      "correct=male     guess=female   name=Heath                         \n",
      "correct=male     guess=female   name=Helmuth                       \n",
      "correct=male     guess=female   name=Henri                         \n",
      "correct=male     guess=female   name=Herby                         \n",
      "correct=male     guess=female   name=Hercule                       \n",
      "correct=male     guess=female   name=Hersch                        \n",
      "correct=male     guess=female   name=Jackie                        \n",
      "correct=male     guess=female   name=Jean-Christophe               \n",
      "correct=male     guess=female   name=Jeffie                        \n",
      "correct=male     guess=female   name=Jeramie                       \n",
      "correct=male     guess=female   name=Jere                          \n",
      "correct=male     guess=female   name=Jodie                         \n",
      "correct=male     guess=female   name=Jordy                         \n",
      "correct=male     guess=female   name=Jorge                         \n",
      "correct=male     guess=female   name=Josh                          \n",
      "correct=male     guess=female   name=Kelly                         \n",
      "correct=male     guess=female   name=Kermie                        \n",
      "correct=male     guess=female   name=Knox                          \n",
      "correct=male     guess=female   name=Laurence                      \n",
      "correct=male     guess=female   name=Lefty                         \n",
      "correct=male     guess=female   name=Lorne                         \n",
      "correct=male     guess=female   name=Lyle                          \n",
      "correct=male     guess=female   name=Maddy                         \n",
      "correct=male     guess=female   name=Martie                        \n",
      "correct=male     guess=female   name=Morly                         \n",
      "correct=male     guess=female   name=Morry                         \n",
      "correct=male     guess=female   name=Murdoch                       \n",
      "correct=male     guess=female   name=Mustafa                       \n",
      "correct=male     guess=female   name=Nate                          \n",
      "correct=male     guess=female   name=Neale                         \n",
      "correct=male     guess=female   name=Neddy                         \n",
      "correct=male     guess=female   name=Niki                          \n",
      "correct=male     guess=female   name=Nikki                         \n",
      "correct=male     guess=female   name=Normie                        \n",
      "correct=male     guess=female   name=Ozzy                          \n",
      "correct=male     guess=female   name=Paddie                        \n",
      "correct=male     guess=female   name=Pasquale                      \n",
      "correct=male     guess=female   name=Phillipe                      \n",
      "correct=male     guess=female   name=Prentice                      \n",
      "correct=male     guess=female   name=Ramesh                        \n",
      "correct=male     guess=female   name=Randolph                      \n",
      "correct=male     guess=female   name=Reggie                        \n",
      "correct=male     guess=female   name=Ripley                        \n",
      "correct=male     guess=female   name=Roarke                        \n",
      "correct=male     guess=female   name=Ronnie                        \n",
      "correct=male     guess=female   name=Roy                           \n",
      "correct=male     guess=female   name=Rudy                          \n",
      "correct=male     guess=female   name=Rufe                          \n",
      "correct=male     guess=female   name=Sammie                        \n",
      "correct=male     guess=female   name=Sarge                         \n",
      "correct=male     guess=female   name=Sayre                         \n",
      "correct=male     guess=female   name=Serge                         \n",
      "correct=male     guess=female   name=Shelby                        \n",
      "correct=male     guess=female   name=Shorty                        \n",
      "correct=male     guess=female   name=Skippy                        \n",
      "correct=male     guess=female   name=Smith                         \n",
      "correct=male     guess=female   name=Solly                         \n",
      "correct=male     guess=female   name=Sonnie                        \n",
      "correct=male     guess=female   name=Sonny                         \n",
      "correct=male     guess=female   name=Stinky                        \n",
      "correct=male     guess=female   name=Sydney                        \n",
      "correct=male     guess=female   name=Tabby                         \n",
      "correct=male     guess=female   name=Taite                         \n",
      "correct=male     guess=female   name=Tammy                         \n",
      "correct=male     guess=female   name=Tannie                        \n",
      "correct=male     guess=female   name=Tate                          \n",
      "correct=male     guess=female   name=Teddy                         \n",
      "correct=male     guess=female   name=Timothy                       \n",
      "correct=male     guess=female   name=Tonnie                        \n",
      "correct=male     guess=female   name=Tore                          \n",
      "correct=male     guess=female   name=Tray                          \n",
      "correct=male     guess=female   name=Tremayne                      \n",
      "correct=male     guess=female   name=Troy                          \n",
      "correct=male     guess=female   name=Tulley                        \n",
      "correct=male     guess=female   name=Uriah                         \n",
      "correct=male     guess=female   name=Verge                         \n",
      "correct=male     guess=female   name=Vijay                         \n",
      "correct=male     guess=female   name=Wally                         \n",
      "correct=male     guess=female   name=Waverly                       \n",
      "correct=male     guess=female   name=Willi                         \n",
      "correct=male     guess=female   name=Witty                         \n",
      "correct=male     guess=female   name=Wojciech                      \n",
      "correct=male     guess=female   name=Wolfy                         \n",
      "correct=male     guess=female   name=Worthy                        \n",
      "correct=male     guess=female   name=Yardley                       \n"
     ]
    }
   ],
   "source": [
    "for (tag, guess, name) in sorted(errors): # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE\n",
    "    print('correct=%-8s guess=%-8s name=%-30s'%((tag, guess, name)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.783"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "浏览这个错误列表，它明确指出一些多个字母的后缀可以指示名字性别。例如：yn 结\n",
    "尾的名字显示以女性为主，尽管事实上，n 结尾的名字往往是男性；以 ch 结尾的名字通常\n",
    "是男性，尽管以 h 结尾的名字倾向于是女性。因此，调整我们的特征提取器包括两个字母后缀的特征\n",
    "\"\"\"\n",
    "def gender_features(word):\n",
    "    return {'suffix1': word[-1:],\n",
    "            'suffix2': word[-2:]}\n",
    "train_set = [(gender_features(n), g) for (n,g) in train_names]\n",
    "devtest_set = [(gender_features(n), g) for (n,g) in devtest_names]\n",
    "classifier = nltk.NaiveBayesClassifier.train(train_set)\n",
    "nltk.classify.accuracy(classifier, devtest_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
